axiom 3
Defining the Scope of Learning Analytics: An Axiomatic Approach for Analytic Practice and Measurable Learning Phenomena
Takii, Kensuke, Liang, Changhao, Ogata, Hiroaki
Learning Analytics (LA) has rapidly expanded through practical and technological innovation, yet its foundational identity has remained theoretically under-specified. This paper addresses this gap by proposing the first axiomatic theory that formally defines the essential structure, scope, and limitations of LA. Derived from the psychological definition of learning and the methodological requirements of LA, the framework consists of five axioms specifying discrete observation, experience construction, state transition, and inference. From these axioms, we derive a set of theorems and propositions that clarify the epistemological stance of LA, including the inherent unobservability of learner states, the irreducibility of temporal order, constraints on reachable states, and the impossibility of deterministically predicting future learning. We further define LA structure and LA practice as formal objects, demonstrating the sufficiency and necessity of the axioms and showing that diverse LA approaches -- such as Bayesian Knowledge Tracing and dashboards -- can be uniformly explained within this framework. The theory provides guiding principles for designing analytic methods and interpreting learning data while avoiding naive behaviorism and category errors by establishing an explicit theoretical inference layer between observations and states. This work positions LA as a rigorous science of state transition systems based on observability, establishing the theoretical foundation necessary for the field's maturation as a scholarly discipline.
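To make the observation-to-state inference layer concrete, here is a minimal Bayesian Knowledge Tracing sketch in Python: a hidden mastery state is updated from a discrete sequence of observed responses. This is an illustration of the cited BKT approach, not code from the paper, and the parameter values (p_slip, p_guess, p_transit, and the prior) are arbitrary.

```python
# Minimal Bayesian Knowledge Tracing sketch: infers a hidden mastery state
# from a discrete sequence of observed responses. Parameter values are
# illustrative, not estimates from any dataset.

def bkt_update(p_mastery, correct, p_slip=0.1, p_guess=0.2, p_transit=0.15):
    """One BKT step: posterior over the hidden state given one observation,
    followed by the learning transition."""
    if correct:
        likelihood = p_mastery * (1 - p_slip)
        evidence = likelihood + (1 - p_mastery) * p_guess
    else:
        likelihood = p_mastery * p_slip
        evidence = likelihood + (1 - p_mastery) * (1 - p_guess)
    posterior = likelihood / evidence
    # State transition: an unmastered skill may become mastered after practice.
    return posterior + (1 - posterior) * p_transit

p = 0.3  # prior probability of mastery
for obs in [True, False, True, True]:  # observed correctness of responses
    p = bkt_update(p, obs)
    print(f"observation={obs}, inferred P(mastery)={p:.3f}")
```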
Differential privacy from axioms
Blanc, Guy, Pires, William, Pitassi, Toniann
Differential privacy (DP) is the de facto notion of privacy both in theory and in practice. However, despite its popularity, DP imposes strict requirements which guard against strong worst-case scenarios. For example, it guards against seemingly unrealistic scenarios where an attacker has full information about all but one point in the data set, and still nothing can be learned about the remaining point. While preventing such a strong attack is desirable, many works have explored whether average-case relaxations of DP are easier to satisfy [HWR13,WLF16,BF16,LWX23]. In this work, we are motivated by the question of whether alternate, weaker notions of privacy are possible: can a weakened privacy notion still guarantee some basic level of privacy, and on the other hand, achieve privacy more efficiently and/or for a substantially broader set of tasks? Our main result shows the answer is no: even in the statistical setting, any reasonable measure of privacy satisfying nontrivial composition is equivalent to DP. To prove this, we identify a core set of four axioms or desiderata: pre-processing invariance, prohibition of blatant non-privacy, strong composition, and linear scalability. Our main theorem shows that any privacy measure satisfying our axioms is equivalent to DP, up to polynomial factors in sample complexity. We complement this result by showing our axioms are minimal: removing any one of our axioms enables ill-behaved measures of privacy.
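As a concrete reference point for the composition axiom, the sketch below (our illustration, not the paper's construction) releases two Laplace-noised statistics about the same dataset and adds their epsilons under basic sequential composition; the dataset, queries, and epsilon values are arbitrary toy choices.

```python
# Illustrative sketch (not from the paper): the Laplace mechanism satisfies
# epsilon-DP for a query of bounded sensitivity, and epsilons add under basic
# sequential composition, the kind of composition behaviour the axioms demand.

import numpy as np

def laplace_mechanism(dataset, query, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release query(dataset) with Laplace noise scaled to sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return query(dataset) + rng.laplace(0.0, scale)

data = np.array([0, 1, 1, 0, 1, 1, 1, 0])  # toy binary dataset
eps_per_query = 0.5

# Two releases about the same data: under basic sequential composition the
# total privacy loss is bounded by 0.5 + 0.5 = 1.0.
count = laplace_mechanism(data, lambda d: d.sum(), sensitivity=1, epsilon=eps_per_query)
mean = laplace_mechanism(data, lambda d: d.mean(), sensitivity=1 / len(data), epsilon=eps_per_query)
print(f"noisy count={count:.2f}, noisy mean={mean:.2f}, total epsilon=1.0")
```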
Clone-Resistant Weights in Metric Spaces: A Framework for Handling Redundancy Bias
Berriaud, Damien, Wattenhofer, Roger
We are given a set of elements in a metric space. The distribution of the elements is arbitrary, possibly adversarial. Can we weigh the elements in a way that is resistant to such (adversarial) manipulations? This problem arises in various contexts. For instance, the elements could represent data points, requiring robust domain adaptation. Alternatively, they might represent tasks to be aggregated into a benchmark; or questions about personal political opinions in voting advice applications. This article introduces a theoretical framework for dealing with such problems. We propose clone-proof representation functions as a solution concept. These functions distribute importance across elements of a set such that similar objects ("clones") share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity. Our framework extends the maximum uncertainty principle to accommodate general metric spaces and includes a set of axioms - symmetry, continuity, and clone-proofness - that guide the construction of representation functions. Finally, we address the existence of representation functions satisfying our axioms in the significant case of Euclidean spaces and propose a general method for their construction.
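The sketch below illustrates the redundancy-bias idea with a simple density-based heuristic: each point's weight is scaled down by its "effective multiplicity", so a cluster of near-duplicates shares roughly the weight of a single point. This is not the paper's representation-function construction, only a toy example of clones sharing weight; the Gaussian similarity and bandwidth are arbitrary choices.

```python
# Illustrative heuristic in the spirit of clone-proof weights (NOT the paper's
# construction): each element's weight is inversely proportional to its
# "effective multiplicity", the sum of its similarities to all elements.

import numpy as np

def redundancy_adjusted_weights(points, bandwidth=1.0):
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    similarity = np.exp(-(dists / bandwidth) ** 2)   # 1 on the diagonal
    effective_multiplicity = similarity.sum(axis=1)  # ~k for a cluster of k clones
    weights = 1.0 / effective_multiplicity
    return weights / weights.sum()                   # normalise to a distribution

# Three near-identical points ("clones") and one isolated point in the plane.
pts = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [5.0, 5.0]])
print(redundancy_adjusted_weights(pts))  # clones split ~1/2 between them; the outlier keeps ~1/2
```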
Sampling Ex-Post Group-Fair Rankings
Gorantla, Sruthi, Deshpande, Amit, Louis, Anand
Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $D$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-$k$ ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values. We propose two algorithms to sample a random group-fair ranking from the distribution $D$ mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time $O(k^2\ell)$, where $\ell$ is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution $\delta$-close to $D$ in total variation distance and has expected running time $O^*(k^2\ell^2)$, when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of $k$. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.
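To illustrate the flavor of the dynamic-programming sampler, the toy sketch below uniformly samples a length-$k$ sequence of group labels whose group counts respect given lower and upper bounds. It ignores within-group item identities and relevance, so it is not the paper's algorithm, only an illustration of the counting-then-sampling style behind such a DP.

```python
# Toy sketch of the dynamic-programming idea (ignoring within-group item
# identities): uniformly sample a length-k sequence of group labels whose
# group counts respect given lower/upper bounds.

import math
import random

def sample_group_fair_labels(k, lower, upper, rng=random.Random(0)):
    ell = len(lower)
    # ways[j][s]: sum over counts c_0..c_{j-1} within bounds, summing to s,
    # of prod 1/c_i!  (so ways[ell][k] * k! = number of valid label sequences).
    ways = [[0.0] * (k + 1) for _ in range(ell + 1)]
    ways[0][0] = 1.0
    for j in range(ell):
        for s in range(k + 1):
            if ways[j][s] == 0.0:
                continue
            for c in range(lower[j], min(upper[j], k - s) + 1):
                ways[j + 1][s + c] += ways[j][s] / math.factorial(c)
    if ways[ell][k] == 0.0:
        raise ValueError("no feasible count vector")
    # Backward-sample a count vector proportionally to its number of orderings.
    counts, s = [0] * ell, k
    for j in reversed(range(ell)):
        options = list(range(lower[j], min(upper[j], s) + 1))
        weights = [ways[j][s - c] / math.factorial(c) for c in options]
        c = rng.choices(options, weights=weights)[0]
        counts[j], s = c, s - c
    labels = [g for g, c in enumerate(counts) for _ in range(c)]
    rng.shuffle(labels)  # uniform ordering given the counts
    return labels

print(sample_group_fair_labels(k=6, lower=[1, 2, 0], upper=[3, 4, 2]))
```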
Mean Field Residual Networks: On the Edge of Chaos
We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on the average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapsing of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections, the network will, depending on the nonlinearity, adopt subexponential forward and backward dynamics, and in many cases in fact polynomial. The exponents of these polynomials are obtained through analytic methods and proved and verified empirically to be correct. In terms of the "edge of chaos" hypothesis, these subexponential and polynomial laws allow residual networks to "hover over the boundary between stability and chaos," thus preserving the geometry of the input space and the gradient information flow. In our experiments, for each activation function we study here, we initialize residual networks with different hyperparameters and train them on MNIST. Remarkably, our initialization-time theory can accurately predict test-time performance of these networks, by tracking either the expected amount of gradient explosion or the expected squared distance between the images of two input vectors. Importantly, we show, theoretically as well as empirically, that common initializations such as the Xavier or the He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth. Finally, we have made mathematical contributions by deriving several new identities for the kernels of powers of ReLU functions by relating them to the zeroth Bessel function of the second kind.
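The contrast between feedforward and residual forward dynamics can be eyeballed with a short simulation. The script below is our own toy illustration (not the paper's code or exact setting): it tracks the per-layer squared activation norm for a plain tanh stack versus a tanh residual stack at random initialization, with arbitrary width, depth, and weight-variance choices.

```python
# Illustrative simulation (not the paper's derivation): track how the squared
# activation norm evolves with depth at random initialization, comparing a
# plain tanh feedforward stack with a tanh residual stack.

import numpy as np

rng = np.random.default_rng(0)
width, depth, sigma_w = 512, 50, 1.2  # arbitrary toy settings

x_ff = rng.standard_normal(width)
x_res = x_ff.copy()
for layer in range(depth):
    W1 = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    x_ff = np.tanh(W1 @ x_ff)            # feedforward: norm settles near a fixed point
    W2 = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    x_res = x_res + np.tanh(W2 @ x_res)  # residual: norm keeps growing with depth
    if (layer + 1) % 10 == 0:
        print(f"layer {layer + 1:3d}: ||x_ff||^2/width = {np.mean(x_ff**2):.3f}, "
              f"||x_res||^2/width = {np.mean(x_res**2):.3f}")
```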